Effective Use of Dedicated Wide-Area Networks for High-Performance Distributed Computing
Abstract
Recent advances in Grid technology have made it possible to build so-called computational Grids, or simply Grids, which couple unique or rare resources that are geographically separated and span multiple administrative domains. Such Grids are invariably composed of heterogeneous networks. At a minimum, a high-performance switch carries intracluster messages, and a separate, sometimes dedicated, high-bandwidth network carries intersite messages across the wide area. Although such wide-area networks provide unprecedented bandwidth capacity and reliability, using them effectively remains an open challenge. By default, most applications use TCP/IP for its ease of use and reliability. However, the combination of high bandwidth and high latency found on some of these networks produces enormous bandwidth-delay products, which in turn demand extremely large TCP congestion windows. This makes TCP a poor choice for data-intensive applications that aim to make maximal use of the bandwidth on high-performance networks. To address this bandwidth utilization challenge for Grids connected over dedicated networks, we present a solution that combines the Message Passing Interface (MPI) standard with a UDP-based protocol that adds reliability. MPI provides an interface that allows application programmers to ignore network heterogeneity. To study the efficacy of our approach, we implemented the Reliable-Blast UDP protocol in MPICH-G2, our Grid-enabled MPI implementation. We demonstrated this implementation with a data-intensive MPI Grid visualization application on the NSF TeraGrid and its dedicated high-bandwidth fiber-optic network. We observed an improvement in aggregate bandwidth utilization from 58 Mbps with MPICH-G2 using TCP alone to 9 Gbps with our technique.
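The bandwidth-delay argument and the reliable-UDP alternative can be illustrated with a small Python sketch. The 10 Gbps link rate and 60 ms round-trip time used below are hypothetical values chosen for illustration, not TeraGrid measurements. The 65,535-byte figure is the largest TCP window expressible without the window-scale option. The `blast_with_bitmap` function is a simplified simulation of the blast-then-acknowledge idea commonly associated with Reliable-Blast UDP: bulk UDP sends, followed by a bitmap of received packets returned over a reliable channel. It is not the MPICH-G2 implementation described above. Loss is modeled as independent random drops, and all function names are hypothetical.

```python
import random

# Largest TCP receive window without the window-scale option (16-bit field).
MAX_UNSCALED_TCP_WINDOW = 65535


def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the path full: bandwidth x RTT."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on a window-based sender's rate: one window per round trip."""
    return window_bytes * 8 / rtt_s


def blast_with_bitmap(num_packets: int, loss_rate: float, seed: int = 0) -> int:
    """Simulate a blast-then-bitmap reliable UDP transfer; return rounds needed.

    Hypothetical simplification: in each round the sender blasts every packet
    the receiver has not yet reported, over an unreliable (UDP-like) channel.
    The receiver then returns a bitmap of received packets over a reliable
    (TCP-like) control channel. Loss is modeled as independent random drops.
    """
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    rng = random.Random(seed)
    received = [False] * num_packets  # doubles as the receiver's bitmap
    rounds = 0
    while not all(received):
        rounds += 1
        missing = [i for i, ok in enumerate(received) if not ok]
        for i in missing:
            # Simulated datagram: delivered unless the random draw drops it.
            if rng.random() >= loss_rate:
                received[i] = True
        # After the blast, 'received' is the bitmap the sender would get back
        # over the control channel to decide which packets to resend.
    return rounds


if __name__ == "__main__":
    rtt = 0.06   # hypothetical 60 ms round trip
    link = 10e9  # hypothetical 10 Gbps dedicated link
    bdp = bandwidth_delay_product_bytes(link, rtt)
    cap = window_limited_throughput_bps(MAX_UNSCALED_TCP_WINDOW, rtt)
    print(f"BDP: {bdp / 1e6:.0f} MB in flight")      # BDP: 75 MB in flight
    print(f"Unscaled TCP cap: {cap / 1e6:.1f} Mbps")  # Unscaled TCP cap: 8.7 Mbps
    print("Blast rounds at 1% loss:", blast_with_bitmap(100_000, 0.01))
```

With these assumed numbers, a sender limited to an unscaled window of roughly 64 KB cannot exceed about 8.7 Mbps, however fast the link is. Keeping a 10 Gbps path full would instead require about 75 MB in flight. This illustrates why large bandwidth-delay products strain TCP. The blast approach avoids growing a congestion window: it sends at a chosen rate and reconciles losses only once per round.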